Imitation-Projected Programmatic Reinforcement Learning



Reviews: Imitation-Projected Programmatic Reinforcement Learning

Neural Information Processing Systems

This paper addresses the problem of learning programmatic policies, i.e., policies represented in structured classes such as domain-specific programming languages or regression trees. To this end, the paper proposes a "lift-and-project" framework (IPPG) that alternates between (1) optimizing a policy parameterized by a neural network in an unconstrained policy space and (2) projecting the learned knowledge onto the space of policies constrained to have a programmatic representation. Step (1) is carried out with deep policy gradient methods (e.g., DDPG or TRPO), and step (2) by synthesizing programs that imitate the neural policy's behavior (program synthesis via imitation learning). Experiments on TORCS (a simulated car-racing environment) show that the learned programmatic policies outperform both DDPG and methods that imitate or distill a pre-trained neural policy.
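The alternation the review describes is easy to sketch in code. Below is a minimal, hypothetical illustration of the lift-and-project loop; the names (`propel_loop`, `lift`, `project`) are our own placeholders, not identifiers from the paper, and the routines passed in stand for any deep policy-gradient update and any imitation-based program synthesizer.

```python
from typing import Any, Callable

def propel_loop(
    lift: Callable[[Any], Any],     # e.g., DDPG/TRPO updates on a neural policy
    project: Callable[[Any], Any],  # e.g., program synthesis via imitation
    program_policy: Any,
    n_iters: int = 10,
) -> Any:
    """Alternate (1) unconstrained improvement in neural policy space and
    (2) projection back onto the programmatic policy class."""
    for _ in range(n_iters):
        # (1) Lift: improve an unconstrained (neural) policy, typically
        # warm-started from the current programmatic policy.
        neural_policy = lift(program_policy)
        # (2) Project: synthesize a program imitating the improved policy.
        program_policy = project(neural_policy)
    return program_policy
```

For instance, `lift` could run a few hundred DDPG updates on a network initialized to mimic the current program, while `project` could fit a regression tree or a program in a small DSL to state-action pairs sampled from the neural policy.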


Reviews: Imitation-Projected Programmatic Reinforcement Learning

Neural Information Processing Systems

While the reviewers generally support acceptance, some concerns remain. We strongly encourage the authors to address the concerns raised by the reviewers, as there is room for improvement. Although the paper is borderline because of these concerns, it falls on the side of acceptance given the reviewers' general support and the strong support of Reviewer 2.


Imitation-Projected Programmatic Reinforcement Learning

Verma, Abhinav, Le, Hoang, Yue, Yisong, Chaudhuri, Swarat

Neural Information Processing Systems

We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for such policies remains a challenge. Our approach to this challenge - a meta-algorithm called PROPEL - is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a form of mirror descent that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches.
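Read as a form of functional mirror descent, each iteration takes a gradient step in the unconstrained space and then projects back via imitation. The notation below is our own reading of this abstract rather than the paper's exact formulation: \(\Pi\) is the programmatic policy class, \(\mathcal{H}\) the unconstrained mixed space, \(J\) the expected return, and \(D\) a divergence that the imitation-learning projection approximately minimizes.

```latex
% Illustrative sketch only; the symbols are our own, not the paper's.
\begin{align*}
  h_t       &\leftarrow \pi_t + \eta\,\nabla J(\pi_t)
            && \text{lift: gradient step in the unconstrained space } \mathcal{H} \\
  \pi_{t+1} &\leftarrow \operatorname*{arg\,min}_{\pi \in \Pi} D(\pi,\, h_t)
            && \text{project: imitate } h_t \text{ with a program in } \Pi
\end{align*}
```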